Fast Offline Policy Optimization for Large Scale Recommendation

Authors

Abstract

Personalised interactive systems such as recommender systems require selecting relevant items from massive catalogues dependent on context. Reward-driven offline optimisation of these systems can be achieved by a relaxation of the discrete problem, resulting in policy learning or REINFORCE-style learning algorithms. Unfortunately, this relaxation step requires computing a sum over the entire catalogue, making the complexity of evaluating the gradient (and hence of each stochastic gradient descent iteration) linear in the catalogue size. This calculation is untenable in many real-world examples such as large-catalogue recommender systems, severely limiting the usefulness of this method in practice. In this paper, we derive an approximation of these policy learning algorithms that scales logarithmically with the catalogue size. Our contribution is based upon combining three novel ideas: a new Monte Carlo estimate of the gradient of a policy, the self-normalised importance sampling estimator, and the use of fast maximum inner product search at training time. Extensive experiments show that our algorithm is an order of magnitude faster than naive approaches yet produces equally good policies.
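As a rough illustration of one ingredient named in the abstract, the sketch below compares an exact expectation under a softmax policy, which sums over every catalogue item, with a self-normalised importance sampling estimate built from a handful of sampled items. It is not the paper's algorithm: the function names, the uniform proposal, and the random scores are all assumptions made for this example, and the maximum inner product search step is not shown. Drawing from an arbitrary proposal with `rng.choice` is itself linear in the catalogue size, so the sketch only shows that the estimator avoids the softmax normalising constant.

```python
import numpy as np

def exact_expectation(scores, f):
    """E_pi[f] under pi = softmax(scores); touches every catalogue item."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return float(p @ f)

def snis_expectation(scores, f, proposal, n_samples, rng):
    """Self-normalised importance sampling estimate of E_pi[f].

    Only the scores of the sampled items are used, and the softmax
    normalising constant cancels in the ratio of weighted sums.
    """
    idx = rng.choice(len(scores), size=n_samples, p=proposal)
    # Unnormalised log-weights: log(exp(score) / q(item)).
    log_w = scores[idx] - np.log(proposal[idx])
    w = np.exp(log_w - log_w.max())  # shift for numerical stability
    return float(w @ f[idx] / w.sum())

# Hypothetical toy catalogue: 1000 items with random scores and rewards.
rng = np.random.default_rng(0)
n_items = 1000
scores = rng.normal(size=n_items)
rewards = rng.uniform(size=n_items)
uniform_q = np.full(n_items, 1.0 / n_items)

exact = exact_expectation(scores, rewards)
approx = snis_expectation(scores, rewards, uniform_q, 200_000, rng)
print(exact, approx)
```

The estimate is biased for a finite number of samples but consistent, and its variance depends on how well the proposal matches the policy. A uniform proposal is used here only to keep the example short.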

Similar resources

Cross-Domain Recommendation for Large-Scale Data

Cross-domain algorithms have been introduced to help improve recommendations and to alleviate the cold-start problem, especially in small and sparse datasets. These algorithms work by transferring information from source domain(s) to a target domain. In this paper, we study whether such algorithms can be helpful for large-scale datasets. We introduce a large-scale cross-domain recommender algorithm deri...

Cascading Bandits for Large-Scale Recommendation Problems

Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of user behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend K most attractive items ...

Fast Large-Scale Spectral Clustering by Sequential Shrinkage Optimization

In many applications, we need to cluster large-scale data objects. However, some recently proposed clustering algorithms such as spectral clustering can hardly handle large-scale applications because of their complexity, although their effectiveness has been demonstrated in many previous works. In this paper, we propose a fast solver for spectral clustering. In contrast to traditional spectral cl...

An Efficient Parameter-Free Method for Large Scale Offline Learning

With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of processing power. However, the main limitation to a wider spread of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper we present a ...

A limited memory adaptive trust-region approach for large-scale unconstrained optimization

This study is concerned with a trust-region-based method for solving unconstrained optimization problems. The approach takes advantage of the compact limited memory BFGS updating formula together with an appropriate adaptive radius strategy. In our approach, the adaptive technique lets us decrease the number of subproblems solved, while utilizing the structure of limited memory quasi-Newt...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i8.26158